Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp: refactor (local) executors to start using process manager #7084

Merged
merged 3 commits into from
Dec 15, 2021

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Dec 2, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

This does not change any user-facing DVC behavior, other than exp pidfile location (which was undocumented/for internal use only)

related to: #6440
prerequisite for: #6267

Currently in the experiments API, we have a clear separation between queuing/staging an experiment (generating a stash commit with the desired changes), and execution/reproduction. The problem right now is that "execution" includes other operations that should not be part of the "execution" step. Essentially both populating the temporary workspace and calling Repo.reproduce must always be done as a single operation right now. This is OK in the current local execution workflow, but does not work when we get into the remote scenario - populating the workspace is a push operation that must happen on the local machine, and repro'ing the experiment in that workspace is an operation that happens on the remote machine.

This refactor cleans up the exp executors API so that there is a clearer separation between operations that are not actually dependent on each other, and will allow us to serialize the executor state so that we can do different operations in separate DVC commands/processes when needed.

  • introduce ExecutorManager class
  • move executor init/execution/collection from experiments.__init__ into the
    manager classes
  • make executors serializable via ExecutorInfo
  • move workspace repro into (managed) WorkspaceExecutor
  • remove old exp run "pidfile" behavior in favor of serializing ExecutorInfo within process manager subdirs (alongside where process manager pidfiles/logs/etc are stored)

@pmrowla pmrowla added the A: executors Related to the executors feature label Dec 2, 2021
@pmrowla pmrowla self-assigned this Dec 2, 2021
@pmrowla pmrowla force-pushed the exp-executor-proc branch 2 times, most recently from 66fe6fe to b08cd59 Compare December 6, 2021 07:00
- introduce ExecutorManager classes
- move executor init + execution from experiments.__init__ into the
  manager classes
- make executors serializable via ExecutorInfo
@pmrowla pmrowla force-pushed the exp-executor-proc branch 2 times, most recently from 74c7a45 to 9d7dc74 Compare December 14, 2021 06:50
Comment on lines +296 to +314
def exec_queue(self, jobs: Optional[int] = 1, detach: bool = False):
"""Run a single WorkspaceExecutor.

Workspace execution is done within the main DVC process
(rather than in multiprocessing context)
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could get rid of this entirely and just use WorkspaceExecutor inside the generic base multiprocessing context, but for now keeping the workspace specific behavior at least makes it easier to test exp reproduction (since we don't have to deal with testing function calls from inside multiprocessing child workers).

manager = Manager()
pid_q = manager.Queue()

with ProcessPoolExecutor(max_workers=jobs) as workers:
Copy link
Contributor Author

@pmrowla pmrowla Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should eventually go away entirely in favor of having "attached" execution done by ProcessManager (or rather via an actual task queue that works on top of ProcessManager).

In particular, we should not need to be handling multiprocessing worker signals/PIDs ourselves in exp executor code (it should all be abstracted away by the process manager).

@pmrowla pmrowla marked this pull request as ready for review December 14, 2021 07:42
@pmrowla pmrowla requested a review from a team as a code owner December 14, 2021 07:42
@pmrowla pmrowla requested a review from pared December 14, 2021 07:42
@pmrowla pmrowla changed the title [WIP] exp: refactor (local) executors to use process manager exp: refactor (local) executors to start using process manager Dec 14, 2021
@pmrowla pmrowla requested review from karajan1001 and removed request for pared December 14, 2021 07:44
@pmrowla pmrowla added skip-changelog Skips changelog refactoring Factoring and re-factoring labels Dec 14, 2021
- partially refactor existing pidfile behavior into using proc manager
- old pidfiles are now json-serialized executor info files placed in
  proc manager subdirs (alongside proc manager pidfile/logs/etc)
- move multiprocessing based (attached) execution into separate
  execution path from future "detached" execution support
@pmrowla
Copy link
Contributor Author

pmrowla commented Dec 14, 2021

POC demo of features that are now possible for us after this PR:
asciicast

Copy link
Contributor

@karajan1001 karajan1001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pmrowla pmrowla merged commit 25ef812 into iterative:master Dec 15, 2021
@pmrowla pmrowla deleted the exp-executor-proc branch December 15, 2021 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: executors Related to the executors feature refactoring Factoring and re-factoring skip-changelog Skips changelog
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants